By the end of this tutorial, you should:
understand how to create conditional statements in R
understand how to create loops in R
understand how to create a custom function in R
In programming, logic and repetition form the core of what makes code meaningful. Being able to decide and act based on conditions, as well as the ability to iterate over tasks repeatedly, provides programs with both intelligence and efficiency. Similarly, the encapsulation of specific tasks into custom functions promotes code reusability and clarity.
This tutorial explores three foundational aspects of the R programming language that help us achieve this: conditional statements, loops, and custom functions.
In programming, conditional statements are used to control the flow of your code based on specific conditions being met. For example, they allow your program to execute different code blocks depending on whether a certain condition is met.
Imagine tailoring a message to the user based on their input, or filtering out data that meets certain criteria. With conditional statements, such customisations become possible.
The most common conditional statements used in R programming are ‘if’, ‘else’, and ‘else if’.
The ‘if’ statement evaluates a condition. If the condition is true, it executes the code block within the curly braces ‘{ }’.
In the following example, we evaluate whether x > 0 and, if the condition is met, print a statement to the console:
x <- 5 # assign the value 5 to the variable x
if (x > 0) {
  print("x is positive")
}
[1] "x is positive"
The ‘else’ statement is used in conjunction with the ‘if’ statement. When the condition in the ‘if’ statement is false, the code block within the ‘else’ statement is executed:
x <- -3
if (x > 0) {
  print("x is positive")
} else {
  print("x is non-positive")
}
[1] "x is non-positive"
The ‘else if’ statement is used when you need to check multiple conditions in a sequence. It allows you to add additional conditions after an initial ‘if’ statement:
x <- 0
if (x > 0) {
  print("x is positive")
} else if (x < 0) {
  print("x is negative")
} else {
  print("x is zero")
}
[1] "x is zero"
You can nest these three statements within each other, to create more complex decision-making structures:
x <- 10
y <- 5
if (x > 0) {
  if (y > 0) {
    print("Both x and y are positive")
  } else {
    print("x is positive, y is non-positive")
  }
} else {
  if (y > 0) {
    print("x is non-positive, y is positive")
  } else {
    print("Both x and y are non-positive")
  }
}
[1] "Both x and y are positive"
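When nested branches only test combinations of two conditions, the same decisions can sometimes be written without nesting by using the logical operator `&&` (and). The following is a minimal sketch of that alternative, not a replacement for the nested version; the variable name `msg` is chosen purely for illustration.

```r
x <- 10
y <- 5

# Each branch is checked in order; the first one that is TRUE wins
if (x > 0 && y > 0) {
  msg <- "Both x and y are positive"
} else if (x > 0) {
  msg <- "x is positive, y is non-positive"
} else if (y > 0) {
  msg <- "x is non-positive, y is positive"
} else {
  msg <- "Both x and y are non-positive"
}

print(msg)
```

Whether the flat or the nested form reads better depends on how many conditions are involved; with more than two or three, nesting can become hard to follow.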
Loops are used to execute a block of code repeatedly for a specific number of iterations, or until a certain condition is met. They are important for performing repetitive tasks efficiently, as they save you from writing out the same code multiple times.
A ‘for’ loop iterates over a sequence, such as a vector, and executes a block of code for each element in the sequence:
# Loop over a vector of numbers and
# for each element of the vector
# print the element multiplied by 2.
numbers <- c(1, 2, 3, 4, 5)
for (num in numbers) {
  print(num * 2)
}
[1] 2
[1] 4
[1] 6
[1] 8
[1] 10

# Loop over a range of numbers
for (i in 1:5) {
  print(i * 3)
}
[1] 3
[1] 6
[1] 9
[1] 12
[1] 15
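A common variation is to loop over the positions of a vector rather than its values, so that each result can be stored instead of only printed. The sketch below assumes we want to keep the doubled values; the name `doubled` is illustrative.

```r
numbers <- c(1, 2, 3, 4, 5)

# Pre-allocate a numeric vector of the right length to hold the results
doubled <- numeric(length(numbers))

# seq_along(numbers) produces the positions 1, 2, ..., length(numbers)
for (i in seq_along(numbers)) {
  doubled[i] <- numbers[i] * 2
}

print(doubled)  # the values 2, 4, 6, 8, 10
```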
A ‘while’ loop continues to execute a block of code as long as a specified condition is true:
count <- 1
while (count <= 5) {
  print(count)
  count <- count + 1
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
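A ‘while’ loop is most useful when the number of iterations is not known in advance. As a small sketch (the starting value of 100 and the variable names are assumptions made for illustration), the loop below keeps halving a number until it drops below 1 and counts how many steps that takes:

```r
value <- 100
steps <- 0

# Keep halving until the value falls below 1
while (value >= 1) {
  value <- value / 2
  steps <- steps + 1
}

print(steps)  # 7 halvings: 50, 25, 12.5, 6.25, 3.125, 1.5625, 0.78125
```

Note that the loop body must eventually make the condition false; otherwise the loop never ends.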
You can nest loops inside each other to create more complex iterations:
# Multiplication table
for (i in 1:10) {
  for (j in 1:10) {
    cat(paste(i * j, "\t")) # print each product followed by a tab
  }
  cat("\n") # start a new line after each row
}
1   2   3   4   5   6   7   8   9   10
2   4   6   8   10  12  14  16  18  20
3   6   9   12  15  18  21  24  27  30
4   8   12  16  20  24  28  32  36  40
5   10  15  20  25  30  35  40  45  50
6   12  18  24  30  36  42  48  54  60
7   14  21  28  35  42  49  56  63  70
8   16  24  32  40  48  56  64  72  80
9   18  27  36  45  54  63  72  81  90
10  20  30  40  50  60  70  80  90  100
You can use ‘break’ to exit a loop prematurely when a certain condition is met. You can also use ‘next’ to skip the current iteration and move to the next one:
# An example of 'break'
for (i in 1:10) {
  if (i > 5) {
    break
  }
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

# An example of 'next'
for (i in 1:10) {
  if (i %% 2 == 0) {
    next
  }
  print(i)
}
[1] 1
[1] 3
[1] 5
[1] 7
[1] 9
As noted above, loops are a fundamental concept in R (and most programming languages) for performing repetitive tasks in a systematic and efficient manner. Understanding loops and their applications is essential for writing complex and flexible code.
However, keep in mind that R is optimized for vectorized operations, and it is often more efficient to use built-in R functions like apply, sapply, lapply, or tapply for repetitive tasks rather than writing explicit loops (Section 10.6).
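To make the comparison concrete, here is a minimal sketch that doubles each element of a vector in three ways: with an explicit loop, with vectorised arithmetic, and with sapply. The variable names are illustrative only.

```r
numbers <- c(1, 2, 3, 4, 5)

# 1. Explicit loop, storing each result by position
loop_result <- numeric(length(numbers))
for (i in seq_along(numbers)) {
  loop_result[i] <- numbers[i] * 2
}

# 2. Vectorised arithmetic: the multiplication applies to every element
vectorised_result <- numbers * 2

# 3. sapply: apply a function to each element and simplify to a vector
sapply_result <- sapply(numbers, function(x) x * 2)

# All three approaches give the same values
print(identical(loop_result, vectorised_result))
print(identical(vectorised_result, sapply_result))
```

For simple arithmetic like this, the vectorised form is usually the shortest and clearest of the three.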
In R, a ‘function’ is a reusable piece of code that takes inputs, processes them, and returns an output.
Writing custom functions in R can help us create more efficient and readable code, minimise repetitive tasks, and improve the ‘re-usability’ of our code. With a custom function, you can encapsulate a piece of logic once and then call it wherever it is needed, keeping your code DRY (Don’t Repeat Yourself) and more maintainable.
Functions are really useful if you find yourself doing similar operations on different datasets, which is often the case in sport data analytics.
Functions in R consist of four components: a name, arguments, a body, and a return value.
For example:
function_name <- function(arguments) {
  # function body
  return(output)
}
In this example, we create a function called ‘add_numbers’ that adds two numbers together:
add_numbers <- function(x, y) {
  sum <- x + y
  return(sum)
}
result <- add_numbers(5, 10)
print(result)
[1] 15
You can set default values for arguments in a custom function. If a value is not provided for an argument with a default value, the default value will be used.
In this example, a function called ‘raise_to_power’ is created that raises a number to a power, with a default value of 2 for the power:
raise_to_power <- function(x, power = 2) {
  result <- x^power
  return(result)
}
print(raise_to_power(4)) # Output: 16 (4^2)
[1] 16
print(raise_to_power(4, 3)) # Output: 64 (4^3)
[1] 64
The ‘...’ (ellipsis) argument lets a function accept a variable number of input arguments.
Example: A function that calculates the sum of an arbitrary number of numbers.
sum_numbers <- function(...) {
  numbers <- list(...)
  total <- sum(unlist(numbers))
  return(total)
}
print(sum_numbers(1, 2, 3, 4, 5)) # Output: 15
[1] 15
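The ellipsis is also often used to pass extra arguments through to another function. The sketch below assumes a hypothetical wrapper called `average_of` that forwards anything supplied via `...` to the base function `mean()`, such as its `na.rm` argument:

```r
# Hypothetical wrapper: any extra arguments are forwarded to mean()
average_of <- function(x, ...) {
  return(mean(x, ...))
}

scores <- c(10, 20, NA, 30)

print(average_of(scores))                # NA, because of the missing value
print(average_of(scores, na.rm = TRUE))  # 20, the mean of 10, 20 and 30
```

This keeps the wrapper short while still exposing the options of the function it calls.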
Functions can accept named arguments and return named values.
In this example, a function called ‘rectangle_properties’ is created that calculates the area and perimeter of a rectangle.
rectangle_properties <- function(length, width) {
  area <- length * width
  perimeter <- 2 * (length + width)
  return(list(area = area, perimeter = perimeter))
}
properties <- rectangle_properties(5, 10)
print(properties)
$area
[1] 50

$perimeter
[1] 30
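Because the function returns a named list, each value can be picked out by name. As a brief sketch, the function is repeated here so the block runs on its own, and the individual elements are accessed with `$` and `[[ ]]`:

```r
rectangle_properties <- function(length, width) {
  area <- length * width
  perimeter <- 2 * (length + width)
  return(list(area = area, perimeter = perimeter))
}

properties <- rectangle_properties(5, 10)

# Access individual named values from the returned list
print(properties$area)            # 50
print(properties[["perimeter"]])  # 30
```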
To access a function in R, you simply call the function by its name, followed by the input arguments required by the function within parentheses. If the function has default arguments, you can omit them when calling the function, and R will use the default values.
If you do want to provide specific values for the arguments, include them within the parentheses in the order they are defined, separated by commas. You can also use named arguments to specify the values of the input arguments, regardless of their order.
Here’s an example of how to access a custom function in R:
multiply_numbers <- function(x, y) {
  result <- x * y
  return(result)
}
Then, we can access the function by calling it with input arguments:
# Using positional arguments
result1 <- multiply_numbers(5, 10)
print(result1) # Output: 50
[1] 50
# Using named arguments
result2 <- multiply_numbers(x = 5, y = 10)
print(result2) # Output: 50
[1] 50
# Using named arguments in a different order
result3 <- multiply_numbers(y = 10, x = 5)
print(result3) # Output: 50
[1] 50
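The same calling conventions apply when a function has default arguments. As a sketch, the block below defines a function with a default value for `power` (mirroring the earlier ‘raise_to_power’ example so that it runs on its own) and calls it with the default omitted, overridden by name, and with named arguments in a different order:

```r
raise_to_power <- function(x, power = 2) {
  return(x^power)
}

print(raise_to_power(3))                 # default power of 2 is used: 9
print(raise_to_power(3, power = 3))      # default overridden by name: 27
print(raise_to_power(power = 3, x = 2))  # named arguments in any order: 8
```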
You already know that to access a function that is part of a package or library outside of base R, you need to load the package first using the ‘library()’ function, and then call the desired function as demonstrated above.
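As a quick reminder of that pattern, the sketch below loads the ‘tools’ package, which ships with a standard R installation, and calls its `file_ext()` function. The file name is made up for illustration.

```r
# Load the package first...
library(tools)

# ...then call one of its functions by name
extension <- file_ext("match_data.csv")
print(extension)  # "csv"
```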
If you want to access (for example) the ‘mean()’ function from the base package (which is loaded by default), you can call it like this:
numbers <- c(1, 2, 3, 4, 5) # We create a vector called 'numbers' containing five elements
average <- mean(numbers) # Now, we call the function 'mean' and pass it the 'numbers' vector
print(average) # This prints the value 3, which is the mean of the elements in the vector
[1] 3
‘lapply’ and ‘sapply’ are two of the most commonly used “apply” family functions in R. These functions allow us to perform operations on each element of a list, vector, or other iterable¹ data structures in a more efficient and concise manner compared to writing explicit loops.
The lapply function takes a list, vector, or other iterable data structure as input, applies a specified function to each element, and returns a list with the results.
Syntax: lapply(X, FUN, ...), where X is the input data structure, FUN is the function to apply, and ... are optional additional arguments for FUN.
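To illustrate how the optional additional arguments are forwarded to FUN, here is a minimal sketch that sorts each vector in a list in decreasing order by passing `decreasing = TRUE` through to `sort()`. The list `samples` and its contents are invented for the example.

```r
samples <- list(a = c(3, 1, 2), b = c(10, 30, 20))

# decreasing = TRUE is not an argument of lapply itself;
# it is passed on to sort() for every element of the list
sorted <- lapply(samples, sort, decreasing = TRUE)

print(sorted$a)  # 3 2 1
print(sorted$b)  # 30 20 10
```

The result is still a list with the same names as the input.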
numbers <- c(1, 2, 3, 4, 5)
squared_numbers <- lapply(numbers, function(x) x^2)
The sapply function works similarly to lapply, but it tries to simplify the output into a more convenient data structure, such as a vector or matrix, if possible.
Syntax: sapply(X, FUN, ..., simplify = TRUE), where X is the input data structure, FUN is the function to apply, ... are optional additional arguments for FUN, and simplify is an optional parameter that controls the output simplification (default is TRUE):
numbers <- c(1, 2, 3, 4, 5)
squared_numbers <- sapply(numbers, function(x) x^2)
Use ‘lapply’ when you want the output to always be a list, regardless of the input or function applied.
Use ‘sapply’ when you prefer a simplified output, such as a vector or matrix, if possible. Note that if simplification is not possible, sapply will return a list, just like lapply:
list_of_vectors <- list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))
sum_list <- lapply(list_of_vectors, sum) # Output is a list
sum_vector <- sapply(list_of_vectors, sum) # Output is a named numeric vector
In the following exercise, you are asked to create a simple custom function. Skeleton code is provided below to assist you.
Create a function named mult_add that accepts three arguments: num1, num2, and add_val.
The function should multiply num1 and num2 together.
Then, it should add add_val to the result of the multiplication.
Finally, the function should return the final result.
Test your function by calling it with the arguments 5, 6, and 3 (i.e., mult_add(5, 6, 3)). It should return 33, since (5 * 6) + 3 = 33.
Skeleton Code:
mult_add <- function(num1, num2, add_val) {
  # Multiply num1 and num2
  # Add add_val to the multiplication result
  # Return the final result
}
Solution:
mult_add <- function(num1, num2, add_val) {
  mult_result <- num1 * num2
  final_result <- mult_result + add_val
  return(final_result)
}
print(mult_add(5, 6, 3))
[1] 33
¹ An iterable data structure is a collection of items that can be traversed or looped through, one item at a time. An example would be a list.